Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Abstract
DNA-binding proteins (DBPs) participate in many crucial processes in the life cycle of cells, and identifying and characterizing these proteins is of great importance. We present here a random forest classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the protein surface were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features such as the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all of them correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
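The evaluation protocol named in the abstract (10-fold cross-validation scored by sensitivity and specificity) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function names `stratified_folds` and `sensitivity_specificity` are hypothetical, stratifying the folds by class is an assumption (the abstract does not say how the folds were drawn), and the toy predictions are made up purely to exercise the metric. The only values taken from the abstract are the class sizes, 138 DBPs and 110 non-binders.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class ratios.

    Assumption: stratification by class; the abstract only states
    that 10-fold cross-validation was used.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = binds DNA.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Class sizes from the abstract: 138 DBPs (1) and 110 non-binders (0).
labels = [1] * 138 + [0] * 110
folds = stratified_folds(labels, k=10)

# In each round, one fold is held out for testing and the other nine
# are used for training; a classifier would be fitted on train_idx.
for held_out in range(len(folds)):
    test_idx = folds[held_out]
    train_idx = [i for j, f in enumerate(folds) if j != held_out for i in f]

# Hypothetical predictions, used only to show how the metric is computed.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

Reporting sensitivity and specificity separately, rather than a single accuracy figure, shows the error rate on each class, which is useful when the two classes differ in size as they do here.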
Similar resources
Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins.
Proteins that interact with DNA are involved in a number of fundamental biological activities such as DNA replication, transcription, and repair. A reliable identification of DNA-binding sites in DNA-binding proteins is important for functional annotation, site-directed mutagenesis, and modeling protein-DNA interactions. We apply Support Vector Machine (SVM), a supervised pattern recognition me...
Identifying DNA-binding proteins using structural motifs and the electrostatic potential.
Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) a positive electrostatic potential in the region of the binding region. We focus on three structural motifs: helix-turn-helix (HTH...
Annotating nucleic acid-binding function based on protein structure.
Many of the targets of structural genomics will be proteins with little or no structural similarity to those currently in the database. Therefore, novel function prediction methods that do not rely on sequence or fold similarity to other known proteins are needed. We present an automated approach to predict nucleic-acid-binding (NA-binding) proteins, specifically DNA-binding proteins. The metho...
IDENTIFICATION, ISOLATION, CLONING AND SEQUENCING A PARTIAL ANNEXIN GENE FROM AUREOBASIDIUM PULLULANS
Background and Objectives: Annexin is the common name for genes and proteins that were identified as calcium-dependent phospholipid-binding proteins. Recently a more complex set of functions has been recognized for this superfamily of proteins including in vesicle trafficking, cell division, apoptosis, calcium signalling, mineralization, crystal nucleation inside the extracellular organelle...
iDBPs: a web server for the identification of DNA binding proteins
SUMMARY The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as ...
Journal:
- Journal of molecular biology
Volume 387, Issue 4
Pages: -
Publication date: 2009